Cell Genomics

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Cell Genomics's content profile, based on 162 papers previously published here. The average preprint has a 0.22% match score for this journal, so anything above that is already an above-average fit.
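As a rough illustration of what "above average" means here, a listed match score can be divided by the 0.22% average quoted above. This is only a sketch: the helper name `fold_over_average` is hypothetical, and the site's actual scoring method is not described on this page.

```python
# Average match score for this journal, in percent, as quoted on the page.
AVERAGE_MATCH_PERCENT = 0.22


def fold_over_average(score_percent: float) -> float:
    """Return how many times larger a match score is than the average score."""
    return score_percent / AVERAGE_MATCH_PERCENT


# Example: the first and last match scores shown in this listing.
top_fold = fold_over_average(18.1)
last_fold = fold_over_average(2.5)
print(f"18.1% is about {top_fold:.0f}x the average")
print(f"2.5% is about {last_fold:.0f}x the average")
```

Under this reading, even the lowest-ranked entry shown sits roughly an order of magnitude above the average preprint.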

1
Multi-ancestral GWAS with the VA Million Veteran Program enables functional interpretation of rheumatoid arthritis alleles

Sakaue, S.; Yang, D.; Zhang, H.; Posner, D.; Rodriguez, Z.; Love, Z.; Cui, J.; Budu-Aggrey, A.; Ho, Y.-L.; Costa, L.; Monach, P.; Huang, S.; Ishigaki, K.; Melley, C.; Tanukonda, V.; Sangar, R.; Maripuri, M.; Sweet, S. M.; Panickan, V.; McDermott, G.; Hanberg, J. S.; Riley, T.; Laufer, V.; Okada, Y.; Scott, I.; Bridges, S. L.; Baker, J.; VA Million Veteran Program; Wilson, P. W.; Gaziano, J. M.; Hong, C.; Verma, A.; Cho, K.; Huffman, J. E.; Cai, T.; Raychaudhuri, S.; Liao, K. P.

2026-04-23 genetic and genomic medicine 10.64898/2026.04.22.26351423 medRxiv
Top 0.1%
18.1%

Rheumatoid arthritis (RA) is a heritable and common autoimmune condition. To date, most genetic associations were derived from individuals with either European or East Asian ancestries. Here, we applied a multimodal automated phenotyping strategy to define RA and performed a genome-wide association study (GWAS) of RA in the Million Veteran Program (MVP), including underrepresented African American (AFR) and Admixed American (AMR) populations. Meta-analyses with previous RA cohorts identified 152 autosomal genome-wide significant loci, of which 31 were novel. Inclusion of multi-ancestry data dramatically improved fine-mapping resolution. Functional characterization of these loci using single-cell transcriptomic and chromatin data suggested new RA genes such as CHD7 and CD247. We identified underappreciated functional roles of fine-grained immune cell states other than T cells, such as B cell and myeloid cell states. We observed that multi-ancestry polygenic risk scores using our data demonstrated better predictive ability, especially for AFR and AMR populations.

2
Sex stratified analyses enable new genetic insights into brain imaging phenotypes

Zhang, N.; Wang, S.; Fu, J.; Ji, Y.; Liu, N.; Qian, Q.; Xue, H.; Ding, H.; Liang, M.; Qin, W.; Xu, J.; Yu, C.

2026-04-21 genetics 10.64898/2026.04.21.719541 medRxiv
Top 0.1%
12.4%

Sex differences are commonly observed in neuroimaging phenotypes and in the risk of brain diseases, yet the underlying genetic mechanisms remain poorly understood. We investigated sex differences in the genetic architecture of 805 neuroimaging phenotypes in 22,950 males and 22,950 females matched for sample size and covariates, and systematically compared sex-stratified with sex-combined genetic analyses. We found eight variant-trait associations with significant sex differences, 235 fine-mapped sex-dominant causal associations, 457 sex-dominant colocalizations with sex hormones, and 96 sex-dominant colocalizations with schizophrenia. Compared with sex-combined analysis, sex-stratified analysis identified 47 new genetic associations, 170 new fine-mapped causal associations, 1,019 new colocalizations with sex hormones, and 191 new colocalizations with schizophrenia. Additionally, sex-stratified analysis improved global heritability and genetic-correlation estimates and enhanced polygenic prediction for certain phenotypes. This work highlights the need to routinely perform sex-stratified genetic association analyses to elucidate sex-specific and sex-shared genetic control of neuroimaging phenotypes and related disorders.

3
Meta-Analysis of Rare Cancers Leveraging Clinically Ascertained Cohorts Reveals Novel Germline Susceptibility Loci

Carver, S.; Perea-Chamblee, T.; Taraszka, K.; Moon, I.; Yu, X.; Ding, Y.; Carrot-Zhang, J.; Gusev, A.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.16.26350975 medRxiv
Top 0.2%
10.1%

Genome-wide association studies (GWAS) have advanced the understanding of germline susceptibility in common cancers, yet rare malignancies remain underexplored due to limited sample sizes. To address this gap, we conducted large-scale GWAS across 20 rare cancer types and meta-analyzed results from three cohorts: two clinically sequenced cancer center cohorts and an independent population biobank, comprising over 480,000 individuals. We identified nine novel genome-wide significant susceptibility loci with moderate to large effect sizes that replicated across cohorts in eight rare malignancies, including myelodysplastic syndromes (MDS), germ cell tumors, gastrointestinal stromal tumor (GIST), gastrointestinal neuroendocrine tumors, anal cancer (ANSC), non-melanoma skin cancer, mesothelioma, and hepatobiliary cancer. Among the strongest associations were loci in MDS near API5 (OR = 2.21, p = 1.06 x 10^-8), in GIST near SLC6A18 and TERT (OR = 1.91, p = 8.20 x 10^-50), and in ANSC near HLA-DQA2 (OR = 1.58, p = 5.50 x 10^-18). The GIST risk variant was enriched in tumors harboring somatic KIT mutations (OR = 2.21, p = 6.5 x 10^-4) and was associated with worse survival among carriers with KIT-mutant tumors (hazard ratio = 4.06, p = 0.015), implicating germline-somatic interplay in tumor initiation and progression. The ANSC risk variant was associated with HPV infection (OR = 1.44, p = 3.19 x 10^-5), supporting a host-viral interaction in HPV-driven tumorigenesis. The MDS risk variant at the API5 locus was associated with altered neutrophil counts, suggesting a role in hematopoietic dysregulation in disease pathogenesis. We further identified novel, independent associations with mesothelioma, GIST, and hepatobiliary cancer at the 5p15.33 locus encompassing TERT, consistent with pleiotropic genetic effects at a core telomere-maintenance gene.
Collectively, these findings demonstrate that integrating clinically ascertained sequencing cohorts with population biobanks substantially enhances germline discovery in rare cancers, enabling identification of high-confidence susceptibility loci and facilitating downstream biological interpretation through linked somatic, viral, and clinical data. This framework provides a scalable approach for characterizing inherited susceptibility across diverse rare malignancies.

4
CalPred yields calibrated intervals for polygenic risk prediction

Shi, Z.; Zhang, Z.; Mandla, R.; Hou, K.; Pasaniuc, B.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.21.26351410 medRxiv
Top 0.4%
7.3%

Polygenic scores (PGS) have emerged as a useful biomarker for stratification of high-risk individuals in genomic medicine, with prediction intervals arising as a principled approach to incorporate statistical uncertainty in their individual-level predictions. In contrast to recent reports by Xu et al. [7], we show that CalPred [6] provides well-calibrated prediction intervals that contain the trait phenotypes at targeted confidence levels. CalPred maintains calibration when PGS performance varies across contextual factors (e.g., ancestry, age, sex, or socio-economic factors), whereas PredInterval [7], a recently introduced method that focuses on marginal calibration across all individuals, exhibits miscalibration.

5
Functional Connectivity of the Neonatal Cerebellum is Impacted by Sex and Polygenic Liability for Autism

Wagner, L.; Chiem, E.; Liu, J.; Hernandez, L. M.

2026-04-19 genetic and genomic medicine 10.64898/2026.04.17.26351076 medRxiv
Top 0.5%
6.7%

The cerebellum rapidly integrates with cerebral networks during infancy and shows consistent structural and functional alterations in Autism Spectrum Disorder (ASD), suggesting that early cerebellar development may be consequential for later behavioral and psychiatric outcomes. Yet, little is known about the effect of ASD genetic liability on cerebello-cerebral functional connectivity in infancy or whether effects may differ by biological sex. Here, we leveraged neonatal functional magnetic resonance imaging, genetic, and behavioral follow-up data from the Developing Human Connectome Project (dHCP) to examine the relationship between ASD polygenic scores (PGS) and functional connectivity of cerebellar regions associated with sensorimotor and social-cognitive functions in 198 term-born neonates (mean age: 9.7 days). We report widespread sex differences in neonatal cerebello-cerebral connectivity that are regionally specific across cerebellar subdivisions. Across the full sample, elevated ASD PGS predicted alterations in cerebello-cerebral connectivity, with hemisphere-dependent differences in sensorimotor cerebellar connectivity with temporal cortex, and hyperconnectivity between the right social-cognitive seed and posterior cingulate. Notably, elevated ASD PGS predicted opposing patterns of cerebello-cerebral connectivity in males and females, including male hyperconnectivity between the right sensorimotor cerebellum and default mode areas, and female hyperconnectivity between the right social-cognitive seed and sensorimotor cortex. Connectivity associated with elevated ASD PGS showed nominal, sex-specific associations with 18-month language ability, attention problems, and emotional reactivity. Our findings show that ASD PGS influences the functional configuration of the cerebellum at birth and suggest that underlying cerebellar connectivity profiles associated with ASD may partially underlie distinct behavioral presentations in males and females.

6
The immune response to childhood vaccines is seasonal

Barrero Guevara, L. A.; Feghali, G.; Kramer, S. C.; Domenech de Celles, M.

2026-04-24 allergy and immunology 10.64898/2026.04.23.26351620 medRxiv
Top 0.5%
6.6%

Vaccination programs worldwide have effectively reduced the burden of childhood diseases, yet immune responses remain highly heterogeneous among individuals. While host characteristics such as age and sex are established determinants of vaccine immunogenicity, the timing of vaccination, specifically the calendar season of vaccination, remains largely underexplored. Although circadian rhythms are known to regulate daily immune function, evidence for long-term circannual patterns has been limited by the difficulty of collecting year-round vaccination data across diverse populations. Here, we show that the season of vaccination systematically shapes the immune response across a broad range of pediatric vaccines. By leveraging data from 96 randomized control trials worldwide, including over 48,000 children vaccinated against 14 pathogens, we demonstrate that immunogenicity after vaccination follows a pronounced latitudinal gradient, typically peaking during colder months in temperate regions and exhibiting distinct variability in the tropics. These findings suggest that the circadian human immune response might extend to a circannual scale, potentially synchronized by environmental cues. Incorporating the season of vaccination into the design of clinical trials and public health campaigns may optimize vaccine performance and enhance seroprotection.

7
A Multi-Omics Computational Pipeline for Systematic Discovery of Retired Self-Antigens as Cancer Vaccine Targets

Wang, V.; Deng, S.; Aguilar, R.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351288 medRxiv
Top 0.5%
6.5%

Background: The retired antigen hypothesis, introduced by Tuohy and colleagues, proposes that tissue-specific proteins expressed conditionally during early life or reproductive stages, then silenced in normal aging tissue, represent safe and effective cancer vaccine targets when re-expressed in tumors. To date, discovery of retired antigens has relied entirely on hypothesis-driven wet lab work, limiting throughput. Methods: Here we present RADAR (Retired Antigen Discovery and Ranking), a multi-omics computational pipeline implemented on a standard server that systematically identifies retired antigen candidates. RADAR comprises four core discovery layers integrating: 1) The Genotype-Tissue Expression Portal (GTEx) normal tissue expression, 2) TCGA tumor re-expression, 3) DNA methylation, and 4) miRNA regulatory networks, each applied sequentially to identify genes exhibiting the epigenetic and post-transcriptional hallmarks of tissue-specific retirement followed by tumor re-activation. Candidate characterization is further supported by three automated modules: 1) protein-level safety screening via the Human Protein Atlas, 2) molecular subtype enrichment analysis, and 3) cross-cancer confirmation, which execute automatically when the relevant data are available for the selected cancer type. Results: The pipeline independently validated known targets including alpha-lactalbumin (LALBA, the basis of the Tuohy Phase 1 triple-negative breast cancer vaccine trial) and anti-Mullerian hormone (AMH), consistent with Tuohy's ovarian cancer vaccine program targeting AMHR2, and rediscovered multiple known cancer-testis antigens (MAGEA1, MAGEC1, SSX1) as positive controls. Among 4,664 initial candidates derived from GTEx, the pipeline identified 20 high-confidence retired antigen candidates passing all filters. DCAF4L2, COX7B2, TEX19, and CT83 emerge as the highest-priority novel candidates for experimental validation, demonstrating zero expression in critical somatic organs, strong epigenetic silencing, and significant re-expression across multiple cancer types. Conclusion: RADAR provides the first systematic computational framework for retired antigen discovery, offering a reproducible and scalable approach to expanding the cancer immunoprevention pipeline beyond individually characterized targets. The pipeline is fully reproducible, requires no specialized hardware, and is immediately extensible to additional TCGA cancer types.

8
A variance QTL approach to uncover gene-fish oil supplement interaction loci for 14 circulating unsaturated fatty acid traits

Ihejirika, S. A.; Stephen, E.; Ye, K.

2026-04-20 genetic and genomic medicine 10.64898/2026.04.13.26350791 medRxiv
Top 0.5%
6.3%

Gene-environment interactions (GEI) contribute to circulating polyunsaturated fatty acid (PUFA) and monounsaturated fatty acid (MUFA) profiles. GEI may partly explain differences in trait variance across genotype groups. To identify GEI for circulating unsaturated fatty acids, we adopted a two-stage strategy. First, we detected quantitative trait loci associated with trait variance (vQTLs). Second, we tested these vQTLs for interaction with fish oil supplements (FOS). We performed genome-wide vQTL screens for 14 plasma PUFA and MUFA phenotypes in a UK Biobank subset of 200,478 participants. At the genome-wide significance threshold (p < 5.0 x 10^-8), we identified 172 vQTL-trait pairs across all 14 traits, and 16 of these vQTLs had no marginal genetic effect on the corresponding trait. We found 46 non-overlapping loci across all phenotypes, with an average of 12 vQTLs per trait. Omega-6% and PUFA% had the most independent vQTLs (N = 24) while DHA% and Omega-3% had the fewest (N = 1 and 2, respectively). For each of the 172 vQTL-trait pairs, we tested the interaction effect of the vQTL with FOS on the corresponding trait. We found six significant interaction signals in DHA, DHA%, Omega-3, Omega-3%, LA, and Omega-6/Omega-3 ratio around the FADS1/2, ZPR1, and SUGP1/TM6SF2 genes. Our results provide a comprehensive resource of vQTLs and gene-FOS interactions shaping the circulating levels of unsaturated fatty acids.

9
Multi-omic signatures of genetic mechanisms inform on type 2 diabetes biology and patient heterogeneity

Sevilla-Gonzalez, M.; Martinez-Munoz, A. M.; Hanson, P. A.; Hsu, S.; Wang, X.; Smith, K.; Chen, Z.-Z.; Szczerbinski, L.; Kaur, V.; Taylor, K. D.; Wood, A. C.; Mi, M. Y.; Li, H.; Wittenbecher, C.; Gerszten, R. E.; Rich, S.; Rotter, J.; Li, J.; Mercader, J. M.; Manning, A. K.; Shah, R. V. K.; Udler, M.

2026-04-25 endocrinology 10.64898/2026.04.17.26351136 medRxiv
Top 0.6%
6.3%

Type 2 diabetes (T2D) is a heterogeneous disease shaped by genetic pathways related to insulin resistance and beta cell dysfunction, but how this heterogeneity is reflected molecularly remains unclear. We integrated partitioned polygenic scores (pPS) with proteomic and metabolomic profiling to define molecular signatures of T2D and their clinical relevance. We analyzed UK Biobank participants with genomic, proteomic, and metabolomic data. In a disease-free training subset, we used LASSO regression to identify multi-omic signatures associated with each pPS by jointly modeling proteins and metabolites. In an independent testing set, we constructed multi-omic scores and examined their associations with clinical traits and diabetes-related outcomes. Mediation analyses were used to investigate putative causal pathways. Key findings were evaluated in the Multi-Ethnic Study of Atherosclerosis (MESA). We identified distinct multi-omic signatures that capture the molecular architecture of T2D genetic risk across physiological subtypes. Compared with genetic scores alone, multi-omic pPS showed larger effect sizes and better disease discrimination. These scores recapitulated subtype-specific physiology and were associated with T2D risk. The Beta-Cell 2 multi-omic score showed marked stratification for insulin use, which was replicated in MESA, where it also predicted future insulin use. Mediation analyses implicated lipoprotein remodeling and fatty acid metabolism in the Lipodystrophy 1 cluster, accounting for up to 45% of the total effect of pPS on T2D risk. Integrating process-specific genetic risk with circulating multi-omic profiles reveals biologically distinct endotypes of T2D and supports a framework for improved patient stratification and risk assessment.

10
Methylation profiling in the Million Veteran Program: design, quality control, and smoking-associated epigenetic signatures

Schreiner, P. A.; Markianos, K.; Francis, M.; Despard, B.; Gorman, B. R.; Said, I.; Dong, F.; Gautam, S.; Dochtermann, D.; Shi, Y.; Devineni, P.; Kirkpatrick, C.; Khazanov, N.; Moser, J.; Million Veteran Program; Huang, G. D.; Muralidhar, S.; Tsao, P. S.; Pyarajan, S.

2026-04-23 genetic and genomic medicine 10.64898/2026.04.22.26351491 medRxiv
Top 0.7%
4.9%

The Million Veteran Program (MVP) represents the largest and one of the most diverse single cohorts associated with longitudinal electronic health record (EHR) data. We profiled a subset of samples from MVP using the Illumina Infinium MethylationEPIC Beadchip (EPIC array) to generate one of the largest single-cohort methylation datasets to date. Methylation profiles were analyzed for 45,460 total individuals, with the most populous ancestries composed of 27,455 Europeans, 11,798 African Americans, and 4,859 Admixed Americans. We detail the strict quality control standards implemented to ensure the most robust method of methylation profiling of the MVP cohort. This dataset was then applied to evaluate the effects of smoking exposure on DNA methylation in MVP participants. Ancestry-stratified epigenome-wide association studies (EWAS) of smoking status (ever/never) were performed using over 750,000 probes with certifiable signal. Our multi-ancestry meta-analysis demonstrates replicability with existing EWAS and identifies 3,207 novel probe-smoking associations unlocked via the depth and breadth of data in this cohort.

11
From GWAS to drug: A framework for drug candidate prioritisation using a gene expression signature matching approach

Chauquet, S.; Jiang, J.-C.; Barker, L. F.; Hunter, Z. L.; Singh, G.; Wray, N. R.; McRae, A. F.; Shah, S.

2026-04-24 genetic and genomic medicine 10.64898/2026.04.22.26349470 medRxiv
Top 1.0%
4.2%

Drug targets supported by human genetic evidence have significantly higher approval rates, making genome-wide association studies a valuable resource for drug candidate prioritisation. Transcriptome-wide association study signature-matching is an emerging in silico approach that integrates GWAS data with expression quantitative trait loci to generate a disease gene expression signature, which is then compared against drug perturbation databases such as the Connectivity Map. Despite recent adoption, there is no consensus on optimal methodology. Here, we systematically benchmark key parameters, including TWAS method, eQTL tissue model, similarity metric, gene set size, and CMap cell line, using LDL cholesterol, familial combined hyperlipidemia, and asthma as proof-of-concept traits. We demonstrate that while TWAS signature-matching can successfully prioritise known first-line treatments, performance is highly sensitive to parameter choice; for instance, the selection of the cell line used for drug signatures alone can dramatically alter drug prioritisation. Based on these findings, we propose a best-practice framework for robust, genetically-informed drug prioritisation using TWAS signature-matching.

12
Biobank-scale survey of gene-diet interactions informs precision nutrition polygenic scores

Di Scipio, M.; Man, A.; Lali, R.; Wu, J.; Le, A.; Franks, P. W.; Pare, G.

2026-04-20 genetic and genomic medicine 10.64898/2026.04.13.26350340 medRxiv
Top 1%
3.6%

Genome-guided dietary advice is a goal of precision nutrition. However, the contribution of gene-diet interactions (GxDs) to disease risk remains unclear, hindering the identification of diet-outcome pairs more likely amenable to genetic-based recommendations. We thus implemented a two-step approach: first, we comprehensively assessed the contributions of genome-wide GxDs to cardiometabolic outcomes across a broad array of dietary exposures in UK Biobank participants (N = 141,144 to 325,989). Second, we selected the 20 significant diet-outcome pairs from the 713 pairs tested (p < 7.0 x 10^-5) and derived GxD polygenic scores. In an independent sample, all scores were nominally associated with their corresponding outcomes, with 12 of 20 polygenic scores Bonferroni significant (p < 0.0025). Further analyses revealed GxD polygenic scores were associated with clinical outcomes such as incident gout, suggesting translational potential. Altogether, these results showcase the promise of GxD scores to inform precision nutrition.
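The two significance thresholds in the abstract above can be checked with a short Bonferroni calculation. This is a minimal sketch that assumes a family-wise alpha of 0.05, which the abstract does not state. The abstract describes only the second threshold as Bonferroni; the first is merely numerically consistent with 0.05 / 713.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed family-wise alpha; not stated in the abstract.
ALPHA = 0.05

# 713 diet-outcome pairs tested in the first step.
discovery = bonferroni_threshold(ALPHA, 713)
# 20 polygenic scores tested in the independent sample.
replication = bonferroni_threshold(ALPHA, 20)

print(f"{discovery:.1e}")    # 7.0e-05
print(f"{replication:.4f}")  # 0.0025
```

Both values match the thresholds quoted in the abstract under the assumed alpha.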

13
Deep Learning Reveals the Modular Genetic Architecture of Cardiovascular Aging

Choi, R. B.; Croon, P. M.; Perera, S.; Oikonomou, E.; Khera, R.

2026-04-24 cardiovascular medicine 10.64898/2026.04.22.26351478 medRxiv
Top 1%
3.6%

Chronological age is a potent determinant of clinical events, but it is conventionally treated as a linear function of time rather than a dynamic process shaped by genetics and tissue-specific senescence. Deep learning models derived from cardiovascular imaging offer an opportunity to quantify biological age across multiple domains and to examine the extent to which these measures capture shared or distinct vulnerabilities. Here, we applied deep learning to estimate biological age from electrocardiograms, cardiac MRI, carotid ultrasound, and retinal imaging, capturing electrical, structural, macrovascular, and microvascular domains in more than 100,000 UK Biobank participants. Genome-wide association and cross-trait heritability analyses showed that cardiovascular aging is not a singular process but a modular phenotype with distinct genetic determinants across modalities. Polygenic risk scores supported these distinct trajectories, showing that different biological age measures capture partly divergent biological processes with corresponding differences in clinical associations. Modality-specific genes also showcased distinct cell-type enrichment patterns. By deconvoluting aging into electrical, structural, macrovascular, and microvascular components, our results demonstrate that AI-derived age metrics capture distinct, disease-specific aging pathways. Ultimately, this modular framework positions deep learning-derived aging models not as holistic measures of health, but as domain-specific biomarkers of cardiovascular vulnerability.

14
A long-read RNA sequencing and polysome profiling framework reveals transposable element-driven transcript diversity and translational rewiring in glioblastoma

Pizzagalli, M.; Sasipalli, S.; Leary, O.; Tran, L.; Haas, B.; Tapinos, N.

2026-04-21 cancer biology 10.64898/2026.04.18.719388 medRxiv
Top 2%
3.5%

Background: Transposable elements (TEs) account for over half of the human genome and are often derepressed in cancer. TEs can add cryptic splice sites, undergo exonization, and generate gene-TE fusion transcripts, but the combined effects of TEs on RNA processing and translation in glioblastoma stem cells (GSCs) remain incompletely elucidated. Results: We combined long-read RNA sequencing with polysome profiling in four patient-derived GSCs and two neural stem cell (NSC) controls to resolve TE-associated transcript diversity and its relationship to ribosomal engagement. Across GSCs, we identified 13,421 alternative splicing (AS) events, 3,077 of which contained TEs within 150 bp of splice junctions. AS sites proximal to TEs were associated with increased isoform switching compared to non-TE-associated AS sites (odds ratio 2.9-4.3). Moreover, AS isoforms generated from TE-proximal sites were more likely to exhibit altered ribosomal association (odds ratio 2.54). Directional shifts were observed, with shorter isoforms associating with monosome fractions and longer isoforms with polysome fractions. To enable systematic detection of gene-TE chimeric transcripts, we developed FuTER (Fusion TE Reporter), a long-read-based framework for identifying TE-associated fusions. Application to GSC datasets identified 78 GSC-enriched fusion transcripts, several supported by breakpoint-spanning reads in polysome fractions, consistent with ribosome association. Conclusions: Our data suggest that TEs correlate with abnormal splicing activity and altered ribosome engagement in glioblastoma stem cells. By integrating long-read sequencing with polysome profiling and fusion detection, we establish a framework for analysis of TE-induced transcript diversity and its effects on cancer evolution and plasticity.

15
Ensemble Approaches to Screening, Diagnosis, and Subtyping of Multiple Sclerosis

Yang, I. Y.; Patil, A.; Jin, O.; Loud, S.; Buxhoeveden, S.; Zhang, D. Y.

2026-04-21 genetic and genomic medicine 10.64898/2026.04.19.26351230 medRxiv
Top 2%
3.1%

Multiple sclerosis (MS) is a debilitating disease affecting more than 1 million Americans, and today is assessed primarily through magnetic resonance imaging (MRI) and observational clinical symptoms. Given the autoimmune nature of MS, we hypothesized that high-dimensional gene expression data from peripheral blood mononuclear cells (PBMCs), when analyzed with the assistance of AI, may collectively serve as valuable biomarkers for the real-time risk and progression of MS. Here, we present PBMC RNA sequencing (RNAseq) results from N=997 samples, including 540 MS, 221 neuromyelitis optica (NMO), and 149 healthy controls. We constructed and optimized ensemble models for three clinical outcomes: (1) discrimination of early MS (EDSS ≤ 2.0) from healthy individuals with 74% AUC at 100% coverage, (2) differential diagnosis of MS from NMO with 91% AUC at 80% coverage, and (3) subtyping RRMS from progressive MS with 79% AUC at 80% coverage. To our knowledge, no prior molecular test has been reported for any of these three MS clinical tasks, and these results may have immediate impact on clinical management of MS patients. Two innovations improved the stratification accuracy of our models: selection of gene sets based on expression variance in disease states, and use of non-linear rank sort and conviction weighting in the ensemble score calculation.

16
Natural variations of cardiac performance in Drosophila identify a central function for Pdp1/dHLF in cardiac aging

Audouin, K.; Saswati, S.; Roder, L.; Krifa, S.; Arquier, N.; Perrin, L.

2026-04-20 genetics 10.1101/2024.09.30.615759 medRxiv
Top 2%
3.1%

The identification of genetic factors influencing cardiac senescence in natural populations is central to our understanding of cardiac aging and to identifying the etiology of associated cardiac disorders in human populations. However, dissecting the genetic underpinnings of complex traits in humans is almost impossible, because genetic background and gene-environment interactions cannot be controlled. Drosophila has striking similarities in cardiac aging with humans, highlighting the conserved nature of cardiac aging for organisms with a heart. Leveraging a large collection of inbred lines from the Drosophila Genetic Reference Panel (DGRP), we provide an accurate analysis of cardiac senescence in a natural population of flies. This permitted the discovery of an unprecedented number of variants and associated genes significantly associated with the natural variation of cardiac aging. We focused on the function of the PAR-domain bZIP transcription factor Pdp1, for which several variants were found associated with natural variation of the aging of multiple cardiac functional traits. We demonstrated that Pdp1 cell autonomously plays a central role in cardiac senescence and might do so by regulating mitochondrial homeostasis. Overall, our work provides a unique resource regarding the genetics of cardiac aging in a natural population.

17
On the Edge of Empire: Paleogenomic Insights into Roman Dacia

De Angelis, F.; Buzic, I.; Kassadjikova, K.; Bolog, A. C.; Timofan, A.; Pearce, J.; Gligor, M.; Fehren-Schmitz, L.; G. Amorim, C. E.

2026-04-21 genomics 10.64898/2026.04.18.719386 medRxiv
Top 2%
3.0%

The Roman province of Dacia, located north of the Danube frontier, represented a key zone of cultural and demographic interaction during the Imperial period. However, the biological impact of Roman colonization in this region has not been characterized using genomic data. Here, we analyze genome-wide data from 34 individuals recovered from the Apulum-Dealul Furcilor necropolis, one of the largest funerary complexes in Roman Dacia. The genome-wide data reveal pronounced genetic heterogeneity within this population, reflecting its position at the intersection of Eastern Europe, the Mediterranean, and West Asia. Notably, we observe a sex-biased pattern of ancestry. Female individuals show stronger affinities to Eastern European, Steppe, and Caucasus-associated populations, suggesting the persistence of local or regionally connected genetic lineages. In contrast, male individuals display closer genetic relationships with Mediterranean and North African groups, including populations associated with Roman and Punic contexts, indicating male-mediated gene flow linked to long-distance mobility. These findings highlight the complex demographic processes shaping Roman frontier communities, where local and incoming populations were integrated through asymmetric social dynamics. Our results provide genomic evidence consistent with sex-biased admixture in Roman Dacia and underscore the role of frontier regions as hubs of genetic and cultural interaction within the Roman Empire.

18
Longitudinal Central Adiposity Accumulation is Associated with Cortical Alteration and Impaired Cognitive Function in Adolescents

Zhang, L.; Qiu, B.; Chen, Z.; Xu, X.; Zhao, R.; Chen, Y.; Ning, C.; Chen, R.; Li, M.; Wang, D.; Fu, J.; Wu, D.

2026-04-23 endocrinology 10.64898/2026.04.22.26351453 medRxiv
Top 2%
2.7%

Childhood obesity remains a pressing global health challenge, yet the impact of dynamic adiposity changes during the active developmental window remains poorly understood. Leveraging longitudinal data from the Adolescent Brain Cognitive Development (ABCD) Study (N=8519 at baseline; N=1873 at 4-year follow-up), our study reveals distinct neurodevelopmental implications of central fat dynamics during adolescence. At baseline, central fat indices (body roundness index, BRI / waist-to-height ratio, WHtR) outperformed BMI in predicting cognitive deficits, showing robust associations with impaired inhibitory control and episodic memory. The prediction effect was partially mediated by cortical changes in prefrontal and temporal regions. Longitudinally, the rate of fat accumulation (Δ) emerged as a critical predictor: faster adiposity accrual predicted attenuated cortical thinning (i.e., slower development) in parietal lobes and poorer executive function at follow-up, while baseline adiposity showed no significant effects on the follow-up brain morphology or cognitive development. Notably, subgroup analyses uncovered that obese adolescents with central fat reduction exhibited accelerated cortical thinning in posterior cingulate (change difference p=0.006-0.029) alongside rapid improvement in inhibitory control (Flanker slope difference p<0.05), whereas those with persistent adiposity showed delayed thinning in the postcentral gyrus. The study reveals that central fat (BRI/WHtR) is closely linked to neurocognitive risks, and that longitudinal fat accumulation, rather than baseline adiposity, drives cortical alteration. Notably, fat reduction activated adaptive neural change in obese adolescents, underscoring the importance of weight regulation during neurodevelopment.

19
Epigenetically constrained astrocyte states underlie prefrontal cortex vulnerability in Down syndrome associated Alzheimer disease

Sun, C.; Thomas, R.; Stringer, C.; Galani, K.; Ho, L.-L.; Sun, N.; Renfro, A.; Wright, S.; Firenze, R.; Tsai, L.-H.; Head, E.; Kellis, M.; Yang, J.

2026-04-21 bioinformatics 10.64898/2026.04.17.719050 medRxiv
Top 2%
2.5%

Down syndrome (DS), caused by trisomy 21, confers a near-universal risk for Alzheimer's disease (AD), yet individuals exhibit marked variability in cognitive decline, suggesting the presence of cellular mechanisms that modulate vulnerability and resilience. However, these mechanisms remain poorly defined in the human brain. Here, we integrate matched single-nucleus RNA-seq and ATAC-seq profiles from the prefrontal cortex (PFC) and amygdala (AMY) of age-matched individuals with DS with and without AD (DSAD), enabling direct comparison within a shared genetic background. We identify basal astrocytes in the PFC as a selectively vulnerable cell state in DSAD, characterized by both reduced abundance and coordinated transcriptional and regulatory reprogramming. This state exhibits a shift away from homeostatic support functions, with decreased cytokine signaling and lipid-handling programs, alongside increased steroid- and nuclear receptor-associated activity. Concomitantly, chromatin accessibility profiling reveals reduced engagement of immune- and stress-responsive transcription factor programs, including AP-1, STAT, and BACH families, with linked regulatory perturbations at loci such as ABCA1, DAB2IP, and IL1RAP. Together, these findings define a previously unrecognized astrocyte state marked by epigenetic constraint and diminished responsiveness to stress and inflammatory signals, distinguishing it from classical reactive astrocyte phenotypes. Our results nominate PFC basal astrocytes as a key locus of vulnerability in DSAD and suggest that failure to mount appropriate astrocyte responses, rather than overt activation alone, may contribute to neurodegenerative progression.

20
GenePT Revisited: Do Better Text Embeddings Make Better Gene Embeddings?

Hedley, J. G.; Torr, P. H. S.; Märtens, K.

2026-04-20 genomics 10.64898/2026.04.16.718976 medRxiv
Top 2%
2.4%

GenePT introduced a simple recipe for gene representations: embed each gene's natural-language description with a general-purpose text embedding model and reuse the resulting vectors across downstream tasks. Since GenePT's release, embedding models have improved rapidly, with many strong open and commercial encoders benchmarked on suites such as the Massive Text Embedding Benchmark (MTEB). We present a controlled "leaderboard" study that keeps the GenePT pipeline fixed and varies only the embedding backbone. We benchmark contemporary encoders on four diverse gene embedding tasks: gene-gene interaction prediction, gene property classification, cell type classification, and prediction of transcriptomic responses to unseen genetic perturbations. Across these settings, newer backbones consistently outperform the original GenePT backbone (text-embedding-ada-002), achieving improvements of 1-17%, while enabling fully reproducible research by avoiding API dependencies.
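
The "fixed pipeline, swappable backbone" idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the gene names and descriptions are placeholders, and `make_hashing_embedder` is a hypothetical toy stand-in for a real text embedding model, used only so the example runs without any API or model download.

```python
import hashlib
import math
from typing import Callable, Dict, List

# An "embedding backbone" is modeled as any function from text to a vector.
Embedder = Callable[[str], List[float]]


def make_hashing_embedder(dim: int) -> Embedder:
    """Toy backbone (assumption): hashed bag-of-words, unit-normalized."""
    def embed(text: str) -> List[float]:
        vec = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]
    return embed


def embed_genes(descriptions: Dict[str, str], embedder: Embedder) -> Dict[str, List[float]]:
    """Fixed pipeline step: one vector per gene description."""
    return {gene: embedder(text) for gene, text in descriptions.items()}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, a simple downstream use of the gene vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Placeholder descriptions (hypothetical, not taken from any database).
DESCRIPTIONS = {
    "GENE_A": "tumor suppressor involved in dna repair",
    "GENE_B": "kinase involved in dna repair signaling",
    "GENE_C": "membrane transporter for glucose uptake",
}

# Only the backbone changes between runs; everything else stays fixed.
BACKBONES = {
    "toy-64": make_hashing_embedder(64),
    "toy-256": make_hashing_embedder(256),
}

if __name__ == "__main__":
    for name, backbone in BACKBONES.items():
        emb = embed_genes(DESCRIPTIONS, backbone)
        print(name, round(cosine(emb["GENE_A"], emb["GENE_B"]), 3))
```

In a real comparison, each entry in `BACKBONES` would wrap an actual encoder, and the similarity printout would be replaced by the benchmark tasks listed in the abstract.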